Layout Analysis based on Text Line Segment Hypotheses
Author
Abstract
This paper describes ongoing work on a document layout analysis system based on text line segments. It presents two novel algorithms: gapped text line finding, which identifies text line segments and uses per-text-line font information to decide where text line segments break; and reading order recovery for text line segments using topological sorting. An extension of the approach to a probabilistic maximum likelihood framework is also discussed.
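The abstract names topological sorting as the basis for reading order recovery but does not give the ordering criteria. The following is a minimal Python sketch under stated assumptions, not the paper's method. Text line segments are treated as axis-aligned boxes with y growing downward, and `precedes` is a hypothetical pairwise rule: segments whose x-ranges overlap are read top to bottom, and a segment entirely to the left of another is read first unless some segment lying vertically between them spans both (for example, a full-width heading). The names `Segment`, `precedes`, and `reading_order` are illustrative; only `graphlib.TopologicalSorter` comes from the Python standard library (3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Segment:
    """Hypothetical text line segment: an axis-aligned box, y grows downward."""
    name: str
    x0: float
    x1: float
    y0: float
    y1: float


def x_overlap(a: Segment, b: Segment) -> bool:
    """True if the horizontal extents of a and b overlap."""
    return a.x0 < b.x1 and b.x0 < a.x1


def precedes(a: Segment, b: Segment, segments: list[Segment]) -> bool:
    """Assumed (not the paper's) rule: must a be read before b?"""
    if x_overlap(a, b):
        # Same column: the segment higher on the page is read first.
        return a.y0 < b.y0
    if a.x1 <= b.x0:
        # a lies entirely left of b: read a first unless a segment lying
        # vertically between them spans both columns (e.g. a heading).
        top, bottom = min(a.y0, b.y0), max(a.y0, b.y0)
        for c in segments:
            if c is a or c is b:
                continue
            if top < c.y0 < bottom and x_overlap(c, a) and x_overlap(c, b):
                return False
        return True
    return False


def reading_order(segments: list[Segment]) -> list[str]:
    """Build the 'read before' graph and topologically sort it."""
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    graph = {s: set() for s in segments}
    for a in segments:
        for b in segments:
            if a is not b and precedes(a, b, segments):
                graph[b].add(a)
    # Raises graphlib.CycleError if the pairwise rule is inconsistent.
    return [s.name for s in TopologicalSorter(graph).static_order()]
```

For a two-column page split by a full-width heading `H`, this rule yields the order `L1, R1, H, L2, R2` regardless of input order. Using a partial order plus topological sorting, rather than a single sort key, lets the sketch handle layouts such as multi-column text interrupted by headings, where no simple coordinate ordering matches the reading order.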